Automatic Multilingual Indexing and Natural Language Processing

Authors

  • Bärbel Ripplinger
  • Paul Schmidt
Abstract

The number of documents collected by information brokers such as bibliographic database producers, libraries and publishers is increasing rapidly. The consequence is a huge demand for indexing and classification, which so far has had to be carried out manually. The AUTINDEX system described in this paper offers tools for monolingual as well as multilingual automatic indexing and classification by taking advantage of sophisticated language processing technologies and existing special-purpose language resources such as thesauri, classification schemes and large lexicons. It will be shown that the use of high-quality NLP can achieve appropriate results.

Similar resources

A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts

Purpose: Each language has its own problems, which makes it necessary to consider appropriate models for the automatic indexing of each language. These models should address the exhaustivity and specificity of indexing. This paper aims to introduce and evaluate a model suited to Persian automatic indexing. This model proposes to break the text into particles of candidate terms and to c...

Combining CBIR and NLP for Multilingual Terminology Alignment and Cross-Language Image Indexing

This paper presents an overview of an approach to cross-language image indexing and multilingual terminology alignment. Content-Based Image Retrieval (CBIR) is proposed as a means of finding similar images in target-language documents on the web, and natural language processing is used to reduce the search space and find the image index. As the experiments are carried out in specialized do...

Automatic Multilingual Indexing and Classification

Most of today's published scientific and technical articles are written in English. Therefore, the number of English documents being collected by information brokers such as bibliographic database producers, libraries and publishers increases rapidly. However, there will still be a number of documents only available in the native language of the author. One method to facilitate access to this i...

Searching and Summarizing in a Multilingual Environment

Multilingual aspects have been gaining more and more attention in recent years. This trend has been accentuated by the global integration of European states and the vanishing cultural and social boundaries. The ever increasing use of foreign languages is due to the information boom caused by the emergence of easy internet access. Multilingual text processing has become an important field bringi...

Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine

This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Comm...

Publication date: 2001